In the last exercise, we exploited some regression models for our analysis. That was a good start, but publications need tables of results and fancy visualizations, e.g., predictions.

Again, we load our GESIS Panel COVID-19 data and also - just to make sure - re-run the regression models:

library(dplyr)
library(haven)
library(sjlabelled)

gp_covid <-
  read_sav(
    "../data/ZA5667_v1-1-0.sav"
  ) %>% 
  set_na(na = c(-1:-99, 97)) %>% 
  mutate(
    likelihood_hospital = hzcy003a,
    age_cat = as.factor(age_cat),
    likelihood_hospital_cut =
      ifelse(
        likelihood_hospital > median(likelihood_hospital, na.rm = TRUE),
        1,
        0
      )
  ) %>% 
  remove_all_labels()

serious_logistic_regression <- 
  glm(
    likelihood_hospital_cut ~ age_cat + sex + political_orientation + marstat,
    data = gp_covid,
    family = binomial(link = "logit")
  )

serious_cauchit_regression <- 
  glm(
    likelihood_hospital_cut ~ age_cat + sex + political_orientation + marstat,
    data = gp_covid,
    family = binomial(link = "cauchit")
  )

In the lecture, we started with the visualization of predictions. However, it is good practice to first build a regression table that shows, for example, the statistical significance of regression coefficients.

1

Build a stargazer regression table of both models to compare them. Use the option type = "text" to have it printed in the console. Moreover, choose a table style for the journal where you want to publish.
You can find an overview of all supported style with ?stargazer::`list of supported styles`.
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
stargazer(
  serious_logistic_regression,
  serious_cauchit_regression,
  type = "text",
  style = "asr"
)
## 
## -----------------------------------------------
##                        likelihood_hospital_cut 
##                        logistic  glm: binomial 
##                                  link = cauchit
##                          (1)          (2)      
## -----------------------------------------------
## age_cat2                0.557        1.036     
## age_cat3                0.224        0.503     
## age_cat4                0.855*       1.428     
## age_cat5               1.019**       1.605     
## age_cat6               1.202**       1.793*    
## age_cat7               1.525***      2.073*    
## age_cat8               1.737***     2.246**    
## age_cat9               1.915***     2.386**    
## age_cat10              1.957***     2.420**    
## sex                     0.069        0.046     
## political_orientation   0.025        0.018     
## marstat                 -0.001       -0.005    
## Constant              -2.153***     -2.548**   
## N                       3,105        3,105     
## Log Likelihood        -1,944.178   -1,944.539  
## AIC                   3,914.355    3,915.078   
## -----------------------------------------------
## *p < .05; **p < .01; ***p < .001

You know what: reviewer 2 is post-significant, at least a little. He requested that you should please use confidence intervals at a level of .93.

2

Rebuild your regression table and add the confidence intervals with the requested level.
Have a look at the help page for all possible options ?stargazer?. Moreover, it may be that CIs are still not shown due to the custom style.
library(stargazer)

stargazer(
  serious_logistic_regression,
  serious_cauchit_regression,
  type = "text",
  ci = TRUE,
  ci.level = .93
)
## 
## =======================================================
##                              Dependent variable:       
##                       ---------------------------------
##                            likelihood_hospital_cut     
##                           logistic      glm: binomial  
##                                         link = cauchit 
##                             (1)              (2)       
## -------------------------------------------------------
## age_cat2                   0.557            1.036      
##                       (-0.168, 1.282)  (-0.551, 2.623) 
##                                                        
## age_cat3                   0.224            0.503      
##                       (-0.514, 0.961)  (-1.151, 2.157) 
##                                                        
## age_cat4                  0.855**           1.428*     
##                        (0.157, 1.553)  (-0.116, 2.972) 
##                                                        
## age_cat5                  1.019***          1.605*     
##                        (0.323, 1.714)   (0.068, 3.142) 
##                                                        
## age_cat6                  1.202***         1.793**     
##                        (0.516, 1.887)   (0.264, 3.322) 
##                                                        
## age_cat7                  1.525***         2.073**     
##                        (0.865, 2.185)   (0.555, 3.591) 
##                                                        
## age_cat8                  1.737***         2.246***    
##                        (1.059, 2.414)   (0.723, 3.769) 
##                                                        
## age_cat9                  1.915***         2.386***    
##                        (1.235, 2.594)   (0.863, 3.909) 
##                                                        
## age_cat10                 1.957***         2.420***    
##                        (1.281, 2.634)   (0.898, 3.942) 
##                                                        
## sex                        0.069            0.046      
##                       (-0.072, 0.210)  (-0.079, 0.171) 
##                                                        
## political_orientation      0.025            0.018      
##                       (-0.013, 0.062)  (-0.015, 0.051) 
##                                                        
## marstat                    -0.001           -0.005     
##                       (-0.088, 0.086)  (-0.078, 0.068) 
##                                                        
## Constant                 -2.153***        -2.548***    
##                       (-2.875, -1.430) (-4.088, -1.009)
##                                                        
## -------------------------------------------------------
## Observations               3,105            3,105      
## Log Likelihood           -1,944.178       -1,944.539   
## Akaike Inf. Crit.        3,914.355        3,915.078    
## =======================================================
## Note:                       *p<0.1; **p<0.05; ***p<0.01

The previous exercise might also show that such integrated packages like stargazer have limits. Because of these limits, standardized procedures of gathering results like broom are quite attractive as packages like huxtable ‘understand’ this standard format.

We’ll leave regression tables now and turn to the plotting of the results. You should know that reviewer 2 wants you to do that after the third review round…

3

Draw a prediction plot of the variable “age_cat” for both models using the sjPlot package. Also, try to draw them side by side.
You can use the patchwork package for creating the joined plot.
library(sjPlot)
library(patchwork)

logit_plot <- 
  plot_model(
    serious_logistic_regression,
    type = "pred",
    terms = "age_cat"
    )

cauchit_plot <- 
  plot_model(
    serious_cauchit_regression,
    type = "pred",
    terms = "age_cat"
    )

logit_plot | cauchit_plot

Reviewer 2 is happy now. But Reviewer 1, who is more trusted by the editor, thinks estimating predictions based on ‘real’ average marginal effects (AME) is more cutting-edge. She also feels that you have proven that choosing a logit or a cauchit link doesn’t make a difference. Therefore she’d prefer to see only one plot for the cauchit model as this is not a methodological paper.

4

Draw a prediction plot of the variable “age_cat” only for the cauchit model, but this time base on AME.
The margins package and its cplot() function are your friend.
library(margins)

cplot(
  serious_cauchit_regression,
  "age_cat",
  what = "prediction"
)
##    xvals     yvals     upper      lower
## 1      1 0.1254915 0.2024187 0.04856432
## 2      2 0.2009891 0.2582776 0.14370054
## 3      3 0.1541380 0.2038258 0.10445029
## 4      4 0.2538967 0.3066802 0.20111319
## 5      5 0.2855169 0.3404940 0.23053976
## 6      6 0.3254362 0.3782727 0.27259970
## 7      7 0.3982705 0.4316913 0.36484958
## 8      8 0.4502259 0.5030795 0.39737228
## 9      9 0.4943734 0.5501290 0.43861782
## 10    10 0.5051618 0.5581895 0.45213405

Bonus: You may have seen that cplot() outputs the actual prediction data automatically. This was annoying in the slides, but you could use this output to draw a plot in ggplot2. Moreover, when you toggle the option draw = FALSE, then only the data are created.

BONUS

Draw a prediction plot of the variable “age_cat” only for the cauchit model based on AME in ggplot2.
  1. The column “xvals” is stored as character. You should convert it into numeric.
  2. There’s a geom called geom_errorbar which you can use for the confidence intervals
library(ggplot2)

prediction_data <-
  cplot(
  serious_cauchit_regression,
  "age_cat",
  what = "prediction",
  draw = FALSE
) %>% 
  dplyr::mutate(
    xvals = as.numeric(xvals)
  )
##    xvals     yvals     upper      lower
## 1      1 0.1254915 0.2024187 0.04856432
## 2      2 0.2009891 0.2582776 0.14370054
## 3      3 0.1541380 0.2038258 0.10445029
## 4      4 0.2538967 0.3066802 0.20111319
## 5      5 0.2855169 0.3404940 0.23053976
## 6      6 0.3254362 0.3782727 0.27259970
## 7      7 0.3982705 0.4316913 0.36484958
## 8      8 0.4502259 0.5030795 0.39737228
## 9      9 0.4943734 0.5501290 0.43861782
## 10    10 0.5051618 0.5581895 0.45213405
ggplot(
  prediction_data,
  aes(x = xvals, y = yvals)
) +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = .1) +
  theme_bw()